YOLO, short for "You Only Look Once", revolutionizes object detection in computer vision by offering a fast and streamlined approach. It uniquely simplifies the detection process, treating it as a single-step task that combines object localization and classification. This innovative method allows YOLO to process images and videos in real-time with high accuracy. The latest version, YOLOv8, further enhances this technology, adeptly tackling challenges in object detection and image segmentation. YOLO's real-time capabilities make it a standout choice for applications needing rapid and precise object identification.
Leveraging YOLO's real-time detection capabilities, this project focuses on Traffic Density Estimation, a vital component in urban and traffic management. The goal is to count vehicles within a specific area in each frame to assess traffic density. This valuable data aids in identifying peak traffic periods, congested zones, and assists in urban planning. Through this project, we aim to develop a comprehensive toolset that provides detailed insights into traffic flow and patterns, enhancing traffic management and city planning strategies.
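Counting vehicles "within a specific area" ultimately reduces to a point-in-polygon test on each detection's reference point (for example, the bounding-box centre). As a minimal, library-free sketch of that test — using the even-odd ray-casting rule, with a hypothetical monitoring zone and invented detection centres:

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test: is point (x, y) inside the polygon?"""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Toggle 'inside' each time a horizontal ray from (x, y) crosses an edge
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

# Hypothetical monitoring zone and detection centres (pixel coordinates)
zone = [(465, 350), (609, 350), (510, 630), (2, 630)]
centres = [(300, 500), (700, 400), (450, 400)]
density = sum(point_in_polygon(x, y, zone) for x, y in centres)
print(density)  # → 2
```

In the real pipeline the centres would come from the detector's bounding boxes rather than a hand-written list.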
import warnings
warnings.filterwarnings('ignore')
!pip install ultralytics
# Import necessary libraries
import os
import shutil
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
import yaml
from PIL import Image
from ultralytics import YOLO
from IPython.display import Video
# Path to the image file
image_path = '/kaggle/input/top-view-vehicle-detection-image-dataset/Vehicle_Detection_Image_Dataset/sample_image.jpg'
# Load a pre-trained YOLOv8 model (COCO weights; the nano checkpoint 'yolov8n.pt' is assumed here)
model = YOLO('yolov8n.pt')

# Perform inference on the provided image(s)
results = model.predict(source=image_path,
                        imgsz=640,      # Resize image to 640x640 (the size of images the model was trained on)
                        conf=0.5,       # Confidence threshold: 50% (only detections above 50% confidence are kept)
                        verbose=False)
# Annotate and convert image to numpy array
sample_image = results[0].plot(line_width=2)
# Convert the color of the image from BGR to RGB for correct color representation in matplotlib
sample_image = cv2.cvtColor(sample_image, cv2.COLOR_BGR2RGB)
# Display annotated image
plt.figure(figsize=(20,15))
plt.imshow(sample_image)
plt.title('Detected Objects in Sample Image by the Pre-trained YOLOv8 Model on COCO Dataset', fontsize=20)
plt.axis('off')
plt.show()
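The `conf=0.5` argument above discards any detection the model scores below 50% confidence. The same filtering step can be illustrated on a toy array of detections (the coordinates and scores below are invented, not real model output):

```python
import numpy as np

# Toy detections, one row per box: [x1, y1, x2, y2, confidence]
detections = np.array([
    [ 10,  20,  60,  80, 0.92],
    [100, 110, 150, 170, 0.48],   # below the 0.5 threshold: dropped
    [200,  40, 260, 120, 0.75],
])

conf_threshold = 0.5
kept = detections[detections[:, 4] >= conf_threshold]
print(len(kept))  # → 2
```

Raising the threshold trades recall for precision: fewer false positives, but borderline objects (like the truck discussed below) may be missed.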
In our sample image, the pre-trained model missed a truck and a car that were clearly visible. A model pre-trained on a dataset with a broad range of classes, like COCO's 80 categories, may underperform on a specific subset of those categories because its capacity is spread across many object types. If we fine-tune this model on a specialized dataset that focuses solely on vehicles, it can learn to detect the various vehicle types more accurately. Fine-tuning adjusts the weights to be more sensitive to vehicle-specific features, so the model's mean Average Precision (mAP) for vehicle detection should improve: it is being optimized on a narrower, more relevant range of classes for our application. Fine-tuning also helps the model generalize better to vehicle detection tasks, reducing false negatives (like the missed truck) and improving overall detection performance.
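mAP, the metric mentioned above, scores each predicted box by its Intersection over Union (IoU) with a ground-truth box at one or more IoU thresholds. A minimal IoU function for `[x1, y1, x2, y2]` boxes (the two boxes in the example call are invented for illustration):

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    # Coordinates of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping by half of one box
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))  # → 0.333...
```

A prediction typically counts as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5.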
# Define the dataset_path
dataset_path = '/kaggle/input/top-view-vehicle-detection-image-dataset/Vehicle_Detection_Image_Dataset'
# Set the path to the YAML file
yaml_file_path = os.path.join(dataset_path, 'data.yaml')

# Load and print the contents of the YAML file
with open(yaml_file_path, 'r') as file:
    yaml_content = yaml.load(file, Loader=yaml.FullLoader)
    print(yaml.dump(yaml_content, default_flow_style=False))
As I previously mentioned, the 'data.yaml' file is a key part of setting up our model. It points out where to find the training and validation images and tells the model that we're focusing on just one class, named 'Vehicle'. This file is essential for making sure our Ultralytics YOLOv8 model learns from our specific dataset, as it guides the model to understand exactly what it needs to look for and where.
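For reference, a `data.yaml` for a single-class vehicle dataset typically looks something like the sketch below; the paths and exact key order here are illustrative, and the printed output above shows this dataset's actual values:

```yaml
train: ../train/images   # directory of training images
val: ../valid/images     # directory of validation images
nc: 1                    # number of classes
names: ['Vehicle']       # class names, indexed by class id
```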
# Set paths for training and validation image sets
train_image_path = os.path.join(dataset_path, 'train', 'images')
validation_image_path = os.path.join(dataset_path, 'valid', 'images')

# Initialize counters for the number of images
num_train_images = 0
num_valid_images = 0

# Initialize sets to hold the unique sizes of images
train_image_size = set()
valid_image_size = set()

# Check train image sizes and count
for filename in os.listdir(train_image_path):
    if filename.endswith('.jpg'):
        num_train_images += 1
        image_path = os.path.join(train_image_path, filename)
        with Image.open(image_path) as img:
            train_image_size.add(img.size)

# Check validation image sizes and count
for filename in os.listdir(validation_image_path):
    if filename.endswith('.jpg'):
        num_valid_images += 1
        image_path = os.path.join(validation_image_path, filename)
        with Image.open(image_path) as img:
            valid_image_size.add(img.size)

# Print the results
print(f"Number of training images: {num_train_images}")
print(f"Number of validation images: {num_valid_images}")

# Check if all images in the training set have the same size
if len(train_image_size) == 1:
    print(f"All training images have the same size: {train_image_size.pop()}")
else:
    print("Training images have varying sizes.")

# Check if all images in the validation set have the same size
if len(valid_image_size) == 1:
    print(f"All validation images have the same size: {valid_image_size.pop()}")
else:
    print("Validation images have varying sizes.")
The dataset for our project consists of 536 training images and 90 validation images, all uniformly sized at 640x640 pixels. This size matches the input resolution YOLOv8 is benchmarked at, supporting both accuracy and speed. The split ratio of approximately 85% for training and 15% for validation provides a substantial amount of data for model learning while retaining enough images for effective model validation.
# List all jpg images in the directory
image_files = [file for file in os.listdir(train_image_path) if file.endswith('.jpg')]

# Select 8 images at equal intervals
num_images = len(image_files)
selected_images = [image_files[i] for i in range(0, num_images, num_images // 8)]

# Create a 2x4 subplot
fig, axes = plt.subplots(2, 4, figsize=(20, 11))

# Display each of the selected images
for ax, img_file in zip(axes.ravel(), selected_images):
    img_path = os.path.join(train_image_path, img_file)
    image = Image.open(img_path)
    ax.imshow(image)
    ax.axis('off')

plt.suptitle('Sample Images from Training Dataset', fontsize=20)
plt.tight_layout()
plt.show()
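As a quick sanity check on the split ratio quoted earlier (536 training vs. 90 validation images):

```python
num_train, num_valid = 536, 90
total = num_train + num_valid
train_frac = num_train / total
valid_frac = num_valid / total
print(f"{train_frac:.1%} train / {valid_frac:.1%} valid")  # → 85.6% train / 14.4% valid
```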
# Train the model on our custom dataset
results = model.train(
    data=yaml_file_path,   # Path to the dataset configuration file
    epochs=100,            # Maximum number of training epochs
    imgsz=640,             # Image size matching the dataset
    device=0,              # Train on GPU 0
    patience=50,           # Early-stopping patience (epochs without improvement)
    batch=32,              # Batch size
    optimizer='auto',      # Let Ultralytics choose the optimizer
    lr0=0.0001,            # Initial learning rate
    lrf=0.1,               # Final learning rate as a fraction of lr0
    dropout=0.1)           # Dropout for regularization
Weights & Biases, known as wandb.ai, is an MLOps tool that works seamlessly with Ultralytics, including the YOLOv8 model. When we train our YOLOv8 model, wandb.ai helps to manage our machine learning experiments by monitoring the training process, logging important metrics, and saving outputs. It's like a dashboard where we can see how our model is learning, with all the details and visualizations to help us understand the training progress and results.
🔑 Providing the API Key
During the model training, an API key is required for Weights & Biases to track and store our model's data. We'll be prompted to sign up at https://wandb.ai/authorize to get this key. After signing up, we copy the API key provided and paste it back into our training setup. This key connects our training session to the Weights & Biases platform, allowing us to access all the great features for monitoring and evaluating our model.
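Instead of pasting the key interactively each run, it can be supplied through the `WANDB_API_KEY` environment variable, which wandb reads at login time (the value below is a placeholder, not a real key); tracking can also be switched off entirely with the `wandb disabled` CLI command if no logging is wanted:

```python
import os

# Placeholder value; substitute the actual key from https://wandb.ai/authorize
os.environ["WANDB_API_KEY"] = "your-api-key-here"

print("WANDB_API_KEY" in os.environ)  # → True
```

Setting the variable before training starts means the prompt never appears, which is convenient for non-interactive notebook reruns.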
# Define the path to the directory containing the post-training files
post_training_files_path = '/kaggle/working/runs/detect/train'
# List the files in the directory
!ls {post_training_files_path}
# Construct the path to the best model weights file using os.path.join
best_model_path = os.path.join(post_training_files_path, 'weights/best.pt')
# Load the best model weights into the YOLO model
best_model = YOLO(best_model_path)
# Validate the best model using the validation set with default parameters
metrics = best_model.val(split='val')
# Define a function to plot learning curves for loss values
def plot_learning_curve(df, train_loss_col, val_loss_col, title):
    plt.figure(figsize=(12, 5))
    sns.lineplot(data=df, x='epoch', y=train_loss_col, label='Train Loss', color='#141140', linestyle='-', linewidth=2)
    sns.lineplot(data=df, x='epoch', y=val_loss_col, label='Validation Loss', color='orangered', linestyle='--', linewidth=2)
    plt.title(title)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()
# Create the full file path for 'results.csv' using the directory path and file name
results_csv_path = os.path.join(post_training_files_path, 'results.csv')
# Load the CSV file from the constructed path into a pandas DataFrame
df = pd.read_csv(results_csv_path)
# Remove any leading whitespace from the column names
df.columns = df.columns.str.strip()
# Plot the learning curves for each loss
plot_learning_curve(df, 'train/box_loss', 'val/box_loss', 'Box Loss Learning Curve')
plot_learning_curve(df, 'train/cls_loss', 'val/cls_loss', 'Classification Loss Learning Curve')
plot_learning_curve(df, 'train/dfl_loss', 'val/dfl_loss', 'Distribution Focal Loss Learning Curve')
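The same `results.csv` can also be mined numerically, for example to find the epoch where validation box loss bottoms out — a rough indicator of when the model stopped improving. A sketch on an invented miniature of the file (the column names follow Ultralytics' convention; the values are made up):

```python
import pandas as pd

# Invented miniature of results.csv (same column-naming convention, fake values)
df = pd.DataFrame({
    'epoch': [1, 2, 3, 4, 5],
    'val/box_loss': [1.20, 0.95, 0.80, 0.78, 0.83],
})

best_epoch = int(df.loc[df['val/box_loss'].idxmin(), 'epoch'])
print(best_epoch)  # → 4
```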
# Inference on the validation set
valid_image_path = os.path.join(dataset_path, 'valid', 'images')

# List all jpg images in the directory
image_files = [file for file in os.listdir(valid_image_path) if file.endswith('.jpg')]

# Select 9 images at equal intervals
num_images = len(image_files)
selected_images = [image_files[i] for i in range(0, num_images, num_images // 9)]

# Initialize the subplot
fig, axes = plt.subplots(3, 3, figsize=(20, 21))
plt.suptitle('Validation Set Inferences', fontsize=24)

# Perform inference on each selected image and display it
for i, ax in enumerate(axes.flatten()):
    image_path = os.path.join(valid_image_path, selected_images[i])
    results = best_model.predict(source=image_path, imgsz=640, conf=0.5)
    annotated_image = results[0].plot(line_width=1)
    annotated_image_rgb = cv2.cvtColor(annotated_image, cv2.COLOR_BGR2RGB)
    ax.imshow(annotated_image_rgb)
    ax.axis('off')

plt.tight_layout()
plt.show()
sample_image_path = '/kaggle/input/top-view-vehicle-detection-image-dataset/Vehicle_Detection_Image_Dataset/sample_image.jpg'

# Perform inference on the provided image using the best model
results = best_model.predict(source=sample_image_path, imgsz=640, conf=0.7)

# Annotate and convert image to numpy array
sample_image = results[0].plot(line_width=2)

# Convert the color of the image from BGR to RGB for correct color representation in matplotlib
sample_image = cv2.cvtColor(sample_image, cv2.COLOR_BGR2RGB)

# Display annotated image
plt.figure(figsize=(20, 15))
plt.imshow(sample_image)
plt.title('Detected Objects in Sample Image by the Fine-tuned YOLOv8 Model', fontsize=20)
plt.axis('off')
plt.show()
# Define the path to the sample video in the dataset
dataset_video_path = '/kaggle/input/top-view-vehicle-detection-image-dataset/Vehicle_Detection_Image_Dataset/sample_video.mp4'
# Define the destination path in the working directory
video_path = '/kaggle/working/sample_video.mp4'
# Copy the video file from its original location in the dataset to the current working directory in Kaggle for further processing
shutil.copyfile(dataset_video_path, video_path)
# Initiate vehicle detection on the sample video using the best performing model and save the output
best_model.predict(source=video_path, save=True)
# Convert the .avi video generated by the YOLOv8 prediction to .mp4 format for compatibility with notebook display
!ffmpeg -y -loglevel panic -i /kaggle/working/runs/detect/predict/sample_video.avi processed_sample_video.mp4
# Embed and display the processed sample video within the notebook
Video("processed_sample_video.mp4", embed=True, width=960)
# Define the threshold for considering traffic as heavy
heavy_traffic_threshold = 10
# Define the vertices for the quadrilaterals
vertices1 = np.array([(465, 350), (609, 350), (510, 630), (2, 630)], dtype=np.int32)
vertices2 = np.array([(678, 350), (815, 350), (1203, 630), (743, 630)], dtype=np.int32)
# Define the vertical range for the slice and lane threshold
x1, x2 = 325, 635
lane_threshold = 609
# Define the positions for the text annotations on the image
text_position_left_lane = (10, 50)
text_position_right_lane = (820, 50)
intensity_position_left_lane = (10, 100)
intensity_position_right_lane = (820, 100)
# Define font, scale, and colors for the annotations
font = cv2.FONT_HERSHEY_SIMPLEX
font_scale = 1
font_color = (255, 255, 255) # White color for text
background_color = (0, 0, 255) # Red background for text
# Open the video
cap = cv2.VideoCapture('sample_video.mp4')
# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('traffic_density_analysis.avi', fourcc, 20.0, (int(cap.get(3)), int(cap.get(4))))
# Read until video is completed
while cap.isOpened():
    # Capture frame-by-frame
    ret, frame = cap.read()
    if ret:
        # Create a copy of the original frame to modify
        detection_frame = frame.copy()

        # Black out the regions outside the specified vertical range
        detection_frame[:x1, :] = 0  # Black out from top to x1
        detection_frame[x2:, :] = 0  # Black out from x2 to the bottom of the frame

        # Perform inference on the modified frame
        results = best_model.predict(detection_frame, imgsz=640, conf=0.4)
        processed_frame = results[0].plot(line_width=1)

        # Restore the original top and bottom parts of the frame
        processed_frame[:x1, :] = frame[:x1, :].copy()
        processed_frame[x2:, :] = frame[x2:, :].copy()

        # Draw the quadrilaterals on the processed frame
        cv2.polylines(processed_frame, [vertices1], isClosed=True, color=(0, 255, 0), thickness=2)
        cv2.polylines(processed_frame, [vertices2], isClosed=True, color=(255, 0, 0), thickness=2)

        # Retrieve the bounding boxes from the results
        bounding_boxes = results[0].boxes

        # Initialize counters for vehicles in each lane
        vehicles_in_left_lane = 0
        vehicles_in_right_lane = 0

        # Loop through each bounding box to count vehicles in each lane
        for box in bounding_boxes.xyxy:
            # Check if the vehicle is in the left lane based on the x-coordinate of the bounding box
            if box[0] < lane_threshold:
                vehicles_in_left_lane += 1
            else:
                vehicles_in_right_lane += 1

        # Determine the traffic intensity for each lane
        traffic_intensity_left = "Heavy" if vehicles_in_left_lane > heavy_traffic_threshold else "Smooth"
        traffic_intensity_right = "Heavy" if vehicles_in_right_lane > heavy_traffic_threshold else "Smooth"

        # Add a background rectangle for the left lane vehicle count
        cv2.rectangle(processed_frame, (text_position_left_lane[0] - 10, text_position_left_lane[1] - 25),
                      (text_position_left_lane[0] + 460, text_position_left_lane[1] + 10), background_color, -1)
        # Add the vehicle count text on top of the rectangle for the left lane
        cv2.putText(processed_frame, f'Vehicles in Left Lane: {vehicles_in_left_lane}', text_position_left_lane,
                    font, font_scale, font_color, 2, cv2.LINE_AA)

        # Add a background rectangle for the left lane traffic intensity
        cv2.rectangle(processed_frame, (intensity_position_left_lane[0] - 10, intensity_position_left_lane[1] - 25),
                      (intensity_position_left_lane[0] + 460, intensity_position_left_lane[1] + 10), background_color, -1)
        # Add the traffic intensity text on top of the rectangle for the left lane
        cv2.putText(processed_frame, f'Traffic Intensity: {traffic_intensity_left}', intensity_position_left_lane,
                    font, font_scale, font_color, 2, cv2.LINE_AA)

        # Add a background rectangle for the right lane vehicle count
        cv2.rectangle(processed_frame, (text_position_right_lane[0] - 10, text_position_right_lane[1] - 25),
                      (text_position_right_lane[0] + 460, text_position_right_lane[1] + 10), background_color, -1)
        # Add the vehicle count text on top of the rectangle for the right lane
        cv2.putText(processed_frame, f'Vehicles in Right Lane: {vehicles_in_right_lane}', text_position_right_lane,
                    font, font_scale, font_color, 2, cv2.LINE_AA)

        # Add a background rectangle for the right lane traffic intensity
        cv2.rectangle(processed_frame, (intensity_position_right_lane[0] - 10, intensity_position_right_lane[1] - 25),
                      (intensity_position_right_lane[0] + 460, intensity_position_right_lane[1] + 10), background_color, -1)
        # Add the traffic intensity text on top of the rectangle for the right lane
        cv2.putText(processed_frame, f'Traffic Intensity: {traffic_intensity_right}', intensity_position_right_lane,
                    font, font_scale, font_color, 2, cv2.LINE_AA)

        # Write the processed frame to the output video
        out.write(processed_frame)

        # Uncomment the following 3 lines if running this code on a local machine to view the real-time processing results
        # cv2.imshow('Real-time Analysis', processed_frame)
        # if cv2.waitKey(1) & 0xFF == ord('q'):  # Press Q on keyboard to exit the loop
        #     break
    else:
        break
# Release the video capture and video write objects
cap.release()
out.release()
# Close all the frames
# cv2.destroyAllWindows()
# Convert the .avi video generated by our traffic density estimation app to .mp4 format for compatibility with notebook display
!ffmpeg -y -loglevel panic -i /kaggle/working/traffic_density_analysis.avi traffic_density_analysis.mp4
# Embed and display the processed sample video within the notebook
Video("traffic_density_analysis.mp4", embed=True, width=960)
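The lane-assignment rule in the processing loop above (left lane if a box's x1 coordinate falls below `lane_threshold`, otherwise right) is simple enough to unit-test on toy boxes:

```python
def count_lanes(boxes_xyxy, lane_threshold=609):
    """Count boxes per lane by their x1 coordinate, mirroring the loop above."""
    left = sum(1 for box in boxes_xyxy if box[0] < lane_threshold)
    right = len(boxes_xyxy) - left
    return left, right

# Toy detections: [x1, y1, x2, y2] (invented coordinates)
boxes = [[100, 400, 180, 470], [550, 380, 640, 450], [700, 420, 790, 500]]
print(count_lanes(boxes))  # → (2, 1)
```

Checking the rule in isolation like this makes it easy to tune `lane_threshold` for a different camera angle without rerunning the whole video pipeline.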